Optimal activation of halting multi‐armed bandit models
Abstract
We study new types of dynamic allocation problems, the Halting Bandit models. As an application, we obtain proofs of the classic Gittins index decomposition result (compare Journal of the Royal Statistical Society, Series B, 1979, 41, 148–177) and of recent results of the authors in Cowan and Katehakis (Probability in the Engineering and Informational Sciences, 2015, 29, 51–76).
Similar resources
The Irrevocable Multiarmed Bandit Problem
This paper considers the multi-armed bandit problem with multiple simultaneous arm pulls and the additional restriction that we do not allow recourse to arms that were pulled at some point in the past but then discarded. This additional restriction is highly desirable from an operational perspective and we refer to this problem as the ‘Irrevocable Multi-Armed Bandit’ problem. We observe that na...
The Nonstochastic Multiarmed Bandit Problem
In the multiarmed bandit problem, a gambler must decide which arm of K nonidentical slot machines to play in a sequence of trials so as to maximize his reward. This classical problem has received much attention because of the simple model it provides of the trade-off between exploration (trying out each arm to find the best one) and exploitation (playing the arm believed to give the best payoff...
Four proofs of Gittins' multiarmed bandit theorem
We study four proofs that the Gittins index priority rule is optimal for alternative bandit processes. These include Gittins’ original exchange argument, Weber’s prevailing charge argument, Whittle’s Lagrangian dual approach, and Bertsimas and Niño-Mora’s proof based on the achievable region approach and generalized conservation laws. We extend the achievable region proof to infinite countable ...
A Lemma on the Multiarmed Bandit Problem
We prove a lemma on the optimal value function for the multiarmed bandit problem which provides a simple direct proof of optimality of writeoff policies. This, in turn, leads to a new proof of optimality of the index rule.
Multiarmed Bandit Problems with Delayed Feedback
In this paper we initiate the study of optimization of bandit type problems in scenarios where the feedback of a play is not immediately known. This arises naturally in allocation problems which have been studied extensively in the literature, albeit in the absence of delays in the feedback. We study this problem in the Bayesian setting. In presence of delays, no solution with provable guarante...
Journal
Journal title: Naval Research Logistics
Year: 2023
ISSN: 1520-6750, 0894-069X
DOI: https://doi.org/10.1002/nav.22145